perm filename 106A14[1,RWF]1 blob
sn#728157 filedate 1983-10-26 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00009 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 _String Types_
C00003 00003 Things we might want a computer to do with strings:
C00005 00004 We can do with strings all the operations implicit in an array of
C00012 00005 Exercise - Dynamic Programming
C00016 00006 For section in Debugging
C00019 00007 Reading and printing of decimal numbers is not a primitive operatin on most
C00026 00008 _Algorithms_
C00033 00009 Algorithms 2
C00037 ENDMK
C⊗;
_String Types_
A string type is a type name that has been defined to be
PACKED ARRAY [1..N] OF CHAR
for some integer constant N. In this text, we will include the value of
N in the names we use for string types; we might define
TYPE STRING8O = PACKED ARRAY[1..80] OF CHAR;
Such types can be used as types of program variables
VAR LINE1, LINE2: STRING80;
or of subprogram parameters
PROCEDURE CENTER (INPUT LINE: STRING80; VAR OUTPUTLINE: STRING80);
They may not, unfortunately, be used as types of function results in Standard
Pascal.
Things we might want a computer to do with strings:
(1) Matching. Let's say that somewhere in a large manuscript I remember
using the phrase "begging the question". I've learned that I misused the
term, so I want to find that place. Probably to find "begging" will suffice.
(2) Substitution. I have developed a program using shorthand names for the
variables. Now I wnat to change GW to GROSSWAGES throughout the program.
(3) Alphabetization. I have an array of strings, each containing the name,
address, and telephone number of one employee. I want to arrange them in
alphabetical order by name, to print a company directory.
(4) Formatting. I want to rearrange my manuscript so that paragraphs are
separated by blank lines, and lines are stretched by insertion of spaces to
give straight margins on left and right. I also want to center the chapter
headings. Or I want to consistently space and indent a Pascal program.
(5) Interpretation. I want my program to read phrases like "three thousand
two hundred eighty" on a check, or "gale force winds" on a meteorological
report and translate them into numerical equivalents.
(6) Translation. I want to provide a word-by-word translation from Spanish
to English.
We can do with strings all the operations implicit in an array of
characters. To find if a word ends in a vowel we might use the program
fragment:
...
...
VAR WORD: STRING20;
C: CHAR;
...
I:=1;
WHILE WORD[I+1] <> ` ' DO I:=I+1;
(*NOW WORD[I+1] IS BLANK*)
C:=WORD[I];
IF (C=`A') OR ... OR (C=`U') THEN S1 ELSE S2
Additionally, Pascal includes some operations automatically defined for string
types. If X and Y are of the _same_ string type, they can be tested for
equality and for alphabetical order:
VAR X,Y: STRING20;
...
...
IF X=Y THEN WRITE ('SAME WORD')
ELSE IF X<Y THEN WRITE ('X COMES BEFORE Y')
ELSE WRITE ('Y COMES BEFORE X')
The tests of alphabetical order are based on the ordering of the
characters, so the results are probably not what you want if the strings
mix capital with lower case letters. In the ASCII code, for example,
'zebra___' comes before 'AARDVARK', because a lower case letter z comes
before an upper case letter A.
You can read to a string variable:
READ(WORD)
reads one line of an input text file, adds spaces on the right if necessary,
and assigns the resulting string to WORD. If the line is longer than WORD,
the command reads only enough to fill WORD.
The string constants belong to the string types of the same length; you can
assign
WORD:='ABCDEFGHIJKLMNOPQRST';
if WORD is of type STRING20.
You can print a string.
WRITELN(WORD) prints ABCD...RST
The list above contains all of the builtin operations in strings in Pascal.
Comparing it to the long list of functions, operators, and relations on
numbers, you can see that Pascal was designed more to work on numbers than
on strings. Many Pascal implementations, though, have extensions ("string
packages") to do more, and programmers can design their own procedures to
do common string manipulations. Here is one to capitalize letters in
string LINE1, putting the result in LINE2:
PROCEDURE CAPITALIZE(LINE1:STRING80;VAR LINE2: STRING80);
VAR I:INTEGER;
BEGIN
FOR I:=1 TO 80 DO
IF LINE1[I] IN [`a'..`z'] THEN
LINE2[I]:=CHR(LINE1[I] - ORD(`a') + ORD(`A')
ELSE LINE2[I]:=LINE1[I]
END;
***New page***
CS106 notes. Enter just before "Numerical Precision"
What to do Until the Doctor Comes
We are talking about a doctor of philosophy specializing in numerical
analysis. In everyday life, everyone should know a little about medicine -
not so that they can prescribe antibiotics for themselves, but so that they
can recognize the symptons that need to be checked out by a professional.
Just as our bodies are inhabited by micro-organisms that normally are
harmless, but occasionally are lethal, our programs are full of
micro-errors arising from the limited precision of computer arithmetic.
Occasionally these small errors during a computation lead to a
catastrophically large error in the result. This is the professional
domain of the numerical analyst. Like the proctologist's, his job is not
to everyone's taste.
We are not going to try to make junior numerical analysts out of you, but
we are going to show you what problems to look for; a competent programmer
must be aware of these danges. According to Prof. Wm. Kahan, several major
passenger aircraft are plagued with aerodynamic problems arising from
faulty numerical analysis in the design process.
*********
Write More.
***New page***
Numerical Precision: Exercise
The evaluation of
1-1/2+1/3-1/4+....+1/9999-1/10000
was carried out by these four program fragments, with the results shown
beside them. The correct result is .
S:=0.0;
FOR I:=1 TO 5000 DO
S:=(S+1/(2*I-1))-1/(2*I)
S:=0.0;
FOR I:= 5OOO DOWNTO 1 DO
S:=(S+1/(2*I-1))-1/(2*I)
SPLUS :=0.0;SMINUS:=0.0;
FOR I:=1 TO 5000 DO
BEGIN
SPLUS:=SPLUS + 1/(2*I-1);
SMINUS:=SMINUS + 1/(2*I)
END;
S:=SPLUS-SMINUS
SPLUS:=0.0; SMINUS:=0.0;
FOR I:=5000 DOWNTO 1 DO
BEGIN
SPLUS:=SPLUS + 1/(2*I-1);
SMINUS:=SMINUS + 1/(2*I)
END;
S:=SPLUS-SMINUS
The calcualtion was done in decimal arithmetic, keeping the six most
significant digits of each number (that is, numbers were truncated, not
rounded, to six digits).
The errors differ enormously. Explain why the errors are the approximate
size they are. Notice that the first program is the obvious one to write,
and has much poorer precision than the others.
Exercise - Dynamic Programming
Some positions that arise in the game of backgammon become simple races,
where the first player to roll a large enough cumulative total on the dice
wins the game. A player rolls two dice; if the numbers on the dice are
different, the player gets the sum, but if the numbers are equal, he gets
twice the sum; 1 & 2 gives the player 3, while 6 & 6 gives him 24.
Calculate the average number of turns required for a player to accumulate a
total of 100. Warning: don't fall into the trap of dividing 100 by the
average roll (8 1/6).
Harder Exercise - Dynamic Programming
Calculate the winning chance of the player whose turn is next in a
backgammon race (see previous problem), if the first player needs a total
of 38, and the second a total of 40.
Hardest Exercise - Dynamic Programming (Suitable for term project)
In backgammon as played for money, at any one time one (or initially both)
of the players has the right to double the stakes before rolling the dice,
if he thinks that on the average he will win the most money by doing so.
The opponent may either give up the game, paying the current stake, or
agree to let the game continue for twice the current stake, in which case
the first player proceeds with his dice roll. When a player uses the right
to double, he loses it and the other player gets it, so doubling alternates
between the players.
In the situation of the previous problem, if the first player has doubled
earlier in the game, so that the current stake is 2 units, on the average
what will be the first player's net winnings, counting losses as negative
winnings? Assume that each player offers and accepts doubles to maximize
average net winnings. To make the problem harder, assume that neither
player has doubled; find the first player's average winnings, and determine
whether he should double before his first roll of the dice.
For section in Debugging
Do you remember the dictionary search algorithm? Look back at page if
you don't (and why don't you have every word memorized? If you appreciated
how hard I worked writing this book, you would study harder, you lazy
good-for-nothing! Do your parents know that they spend $8000 a year so
that you can hang out at the Oasis, the Dutch Goose, and Zots?) Well,
anyway, here is a Pascal algorithm for it, where DICT[I] is a string, the
I'th word in an N-word dictionary.
READ(WORD);
LEFT:=1;
RIGHT:=N;
MIDDLE:=(LEFT+RIGHT) DIV 2
WHILE WORD <> DICT[MIDDLE] DO
BEGIN
IF WORD < DICT[MIDDLE] THEN
RIGHT:=MIDDLE
ELSE LEFT:=MIDDLE;
MIDDLE:=(LEFT+RIGHT) DIV 2
END;
WRITE('WORD IS IN LOCATION', MIDDLE)
_EXERCISE_
The program above does not work. Find what is wrong and repair it.
(Hint: What if it is a two word dictionary?)
(Second Hint: What is the invariant supposed to be?)
_EXERCISE_
The program above is designed on the assumption that WORD is in the
dictionary. Modify it so that it will announce that a word is not in the
dictionary.
(Hint: Assume it is a two-word dictionary, containing `B' and `D'. Make
sure it does what you want, whether WORD is `A', `B', `C', `D', or `E'.
Reading and printing of decimal numbers is not a primitive operatin on most
computers, so Pascal provides standard subalgorithms to handle most of the
common situations that arise. Now and then, however, a problem comes up
where the built-in mechanisms are inadequate, and the programmer must
explicitly handle the individual decimal digits of numbers moving to or
from files. Suppose we have a number N, known to be greater than 100,000
and less than 1,000,000, which we want to print punctuated with a comma
after the thousands digit, like `123,456'. The WRITE command for numbers
does not provide for this form, so the program will have to execute
WRITE(`,), preceded by something that writes out (say) 123, and followed by
something that writes 456.
We can calculate the numbers to be printed before and after the comma as N
DIV 1000 and N mod 1000 repectively, and if N is 123456, the command
WRITE(N DIV 1000:3,`,',N MOD 1000:3) prints
123,456
as desired. If N is 123004, however, we get an unpleasant surprise:
123, 4
because the WRITE command never prints leading zeroes. For the part of the
number that follows the comma, at least, we must work out the individual
digits, and print them separately. To get the tens digit requires two
steps. Taking N MOD 100 gives us the numerical value of the two right hand
digits (56 and 4 in the examples above), after which a division by 10 gives
the tens digit. To print a six digit integer N withut supressing leading
zeroes can be done with this subprogram:
D:=1000000;
FOR I:= 1 TO 6 DO
BEGIN
D:=D DIV 10; (*FIRST TIME 100000, LAST TIME 1*)
Q:=N DIV D; (*NEXT DIGIT ON N*)
WRITE(Q:1);
N:=N MOD D
END
where a comma can be inserted by inserting IF I=3 THEN WRITE (`,') after
the WRITE command.
Exercise
If we don't know that N is greater than 100,000, a more complicated
algorithm is needed. Write a program that will print N in one of these
forms:
1,234,567
123,456
12,345
1,234
123
12
1
0
(Solution: make 0 a special case, otherwise remember if there has been a
non-zero digit, supress all printing until a non-zero digit has been
encountered.
Suppose we want to read integers containing commas, ignoring the commas,
and stopping at the first blankspace. Reading into an integer variable
can't handle the task, and we have to read individual characters, building
up the value of the number from the values of the indivdiual characters.
While the ordinal values of the digit characters are not the same as their
numerical values (this is because the digits are not the first characters
in standard alphabetical order), they are consecutive, so we can read a
digit into a CHAR variable and get its numerical value DIGIT by
READ (C);
DIGIT:= ORD(C) - ORD(`O').
We get the value of a multi-digit member like 123456 from the values of its
individual digits by treating it as a polynomial
1 X 10↑5 + 2 X 10↑4 + 3 X 10↑3 + 4 X 10↑2 + 5 X 10 + 6,
or more simply
((((1 X 10 + 2) X 10 + 3) X 10 + 40 X 10+ 5) X 10 + 6.
Now the plan of the algorithm becomes clear. We read the characters of the
number, descarding commas, and keeping track of the numerical value ot the
part ot the number so far seen:
N:=0;
READ(C);
WHILE C <> ` ' DO
BEGIN
IF C <> `,' THEN
N:=N * 10 + ORD(C) - ORD('O');
READ(C);
END;
(*N CONTAINS THE NUMBER READ*)
(a blank is a zero.
(but ,,,,3,,5,0 is legal.
_Exercise_
The subprogram above fails if the number read is close to the mazimum value
MAXINT. Revise it so it can read any positive integer up to MAXINT.
_Exercise_
Revise it to detect an attempt to read a number larger than MAXINT. Why is
IF N > MAXINT THEN WRITE (`TOO BIG")
not a good approach?
_Exercise_
Revise the reading algorithm to read integers written in base 16, with the
letters A through F serving as digits with values 10 through 15.
_Algorithms_
Originally, the word algorithm (in its older form, "algorism") meant a
method for doing arithmetic using Arabic numerals; the word came from the
name of al-Khowarizmi, a ninth century Arabic mathematician and textbook
author [Knuth]. It has come to mean any completely specified rule for
processing information. Carrying out an algorithm requires no imagination,
nor any understanding of the significance of the computation. Before the
advent of the computer, huge teams of clerks were used to calculate tables
of logarithms and trigonometric functions, tables giving correct angles for
aiming artillery taking account of distance and wind, etc. Probably few of
them understood the reasons for the algorithms. Now we tend to relegate
such algorithms to electronic computers, as we shall see.
Most of us have forgotten how complicated the complete algorithms for
arithmetic are. Here is one such algorithm, to subtract B from A, giving a
result C, where A and B must both be whole numbers (integers), and A must
be greater than B.
(1). Look up the rightmost digits of A and B in Table 1.
Table 1
A\B 0 1 2 3 4 5
---------------------------
0 | 0 9B 8B 7B 6B 5B
|
1 | 1 0 9B 8B 7B 6B
|
2 | 2 1 0 9B 8B 7B
|
3 | 3 2 1 0 9B 8B
|
4 | 4 3 2 1 0 9B
|
5 | 5 4 3 2 1 0
Write down the digit from the table as the leftmost digit of C. Remember
whether there was a B ("borrow") at the table entry. Scratch out the
rightmost digits of A and B. If there was no borrow, go to step 2; If
there was a borrow, go to step 3.
(2). Look up the right most remaining digits of A and B in Table 1. If A
and B are both used up, you are finished. If B is used up, use a zero as
the digit of B. Write down the end of C. Remember whether there was a "B"
at the table entry. Scratch out the rightmost remaining digits of A and
B, if any.
If there was no borrow, start step 2 again; If there was a borrow, go to
step 3. Step 3 is just like step 2, except that it uses Table 2, not Table 1.
Table 2
A\B 0 1 2 3 4
----------------------
0 | 9B 8B 7B 6B 5B
|
1 0 9B 8B 7B 6B
|
2 | 1 0 9B 8B 7B
|
3 | 2 1 0 9B 8B
|
4 | 3 2 1 0 9B
A B C Borrow
Initially 9053 72 nothing
After Step 1 905 7 1 No
After Step 2 90 gone 81 Yes
After Step 3 9 gone 981 Yes
After Step 3 gone gone 8981 No
Finished.
A Case History: The Subtraction Algorithm at work.
Most of you, at some time in your lives, learned to count. In fact, you probably
learned to count starting with any given number; you had an algorithm which,
given a number, would find the number one greater. For small numbers, you could
do this in your head without strain.
When you learned to add, part of the problem was correct handling of carries.
When you add the hundreds digits of two numbers, by mentally looking the pair up
in a table of the sums of one-digit numbers, you must also remember whether there
was a carry (always a 1, if so) fromthe tens digits. If there was a carry, you
use your counting algorithm to get the next number larger than the one in the
addition table. In order to carry out the addition algorithm, you must first know
the counting algorithm. In the same way, to multiply, you must know and use an
addition algorithm at several stages. This pattern is frequent in the contraction
of algorithms; the complex ones are based on the use of simpler ones. An algorithm
in another algorithm is called subalgorithm of the other algorithm, or a procedure.
_Exercise_
Write out in full the algorithm you use to perform long division.
The standard algorithm for multiplication uses, at certain steps, an algorithm for
addition to add up the partial products. Algorithms to add and subtract
numbers with a decimal point use an algorithm for lining up decimal points.
The design and documentation of algorithms is often simplified by referring
to already-established algorithms.
Exercise: give algorithm for halving. Observe that to execute such an
algorithm, you must remember your place in the master algorithm while you
carry out the subordinate one.)
Algorithms 2
Algorithms meant to be performed by humans can be expressed in any way that
the humans can understand. Algorithms meant to be performed by computers
must be expressed in a language computers can interpret. Most computers
are designed to interpret algorithms written in a very crude notation
("machine language") that most humans find unsatisfactory as a language in
which to express algorithms. We meet the needs of both the human algorith
designer and the computer interpreting the algorithm by providing a
translation; programmers write algorithms in a language designed for
convenience and expressiveness, the algorithm is translated (by another,
pre-existing computer program) into machine language, and the machine
language version of the algorithm is carried out by the computer. The
programmer need not know anyghing about the machine language, little
programming is done in machine language.
Pascal is a language for writing algorithms, using a misture of English and
mathematical notation. Programs written in Pascal can be translated, by
other programs called ←compilers← or ←translators←, into the machine
languages of most popular computers. Because Pascal is a standard,
programs written in Pascal can be shared among the users of diverse
computers, and can continue to be used without change when obsolete
computers are replaced by newer ones.
Most Pascal translators accept programs in Pascal extended by notations to
make more efficient use of facilites peculiar to a particular computer.
Some translators actually accept programs written in dialects of Pascal.
The International Standards Organization (ISO) has formulated a definition
of Pascal, called Standard Pascal, which is widely accepted; normally,
programs in Standard Pascal can be expected to work with all translators of
recent design. This text uses Standard Pascal almost almost exclusively;
departures to use computer capabilities not expressible in Pascal are
explicitly labeled non-standard.